
25 research outputs found

Self-Edit: Fault-Aware Code Editor for Code Generation

Author: Jin Zhi
Li Ge
Li Jia
Li Zhuo
Zhang Kechi
Publication date: 05/06/2023

Large language models (LLMs) have demonstrated an impressive ability to generate code for competitive programming tasks. However, with a limited number of samples, LLMs still suffer from poor accuracy. Inspired by the process of human programming, we propose a generate-and-edit approach named Self-Edit that uses the execution results of LLM-generated code to improve code quality on competitive programming tasks. We execute the generated code on the example test case provided in the question and wrap the execution results into a supplementary comment. Using this comment as guidance, our fault-aware code editor corrects errors in the generated code. We perform extensive evaluations on two competitive programming datasets with nine different LLMs. Compared with generating directly from LLMs, our approach improves the average pass@1 by 89% on APPS-dev, 31% on APPS-test, and 48% on HumanEval across nine popular code generation LLMs with parameter sizes ranging from 110M to 175B. Compared with other post-processing methods, our method demonstrates superior accuracy and efficiency.

Comment: Accepted by ACL202
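
A minimal Python sketch of the execute-and-comment step described above, assuming (as in APPS-style problems) that the generated program reads the example input from stdin and prints its answer to stdout. The helper name `build_fault_comment` and the comment wording are hypothetical illustrations, not Self-Edit's actual template, and the fault-aware editor model itself is not shown.

```python
import subprocess
import sys


def build_fault_comment(code: str, test_input: str, expected: str, timeout: float = 5.0) -> str:
    """Run `code` on one example test case and summarize the outcome as a comment.

    Hypothetical comment format; Self-Edit's real template may differ.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            input=test_input,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"# Execution timed out after {timeout} s"
    if result.returncode != 0:
        # Keep only the last traceback line, e.g. "ValueError: ...".
        lines = result.stderr.strip().splitlines()
        last = lines[-1] if lines else "unknown error"
        return f"# Runtime error: {last}"
    actual = result.stdout.strip()
    if actual == expected.strip():
        return "# Passed the example test case"
    return (
        "# Wrong answer on the example test case\n"
        f"# Input: {test_input.strip()!r}\n"
        f"# Expected: {expected.strip()!r}\n"
        f"# Got: {actual!r}"
    )


if __name__ == "__main__":
    buggy = "a, b = map(int, input().split())\nprint(a - b)"
    # The editor model would receive the generated code together with this comment.
    print(build_fault_comment(buggy, "2 3\n", "5"))
```

Running the script prints a wrong-answer comment, because the buggy program prints -1 instead of 5; in a generate-and-edit loop that comment would accompany the code when it is handed to the editor.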

arXiv.org e-Print Archive

Implant Global and Local Hierarchy Information to Sequence based Code Representation Models

Author: Jin Zhi
Li Ge
Li Zhuo
Zhang Kechi
Publication date: 14/03/2023

Source code representation with deep learning techniques is an important research field. Many studies learn sequential or structural information for code representation, but sequence-based and non-sequence-based models both have limitations. Researchers have attempted to incorporate structural information into sequence-based models, but they mine only part of the token-level hierarchical structure information. In this paper, we analyze how the complete hierarchical structure influences the tokens in code sequences and abstract this influence as a property of code tokens called hierarchical embedding. The hierarchical embedding is further divided into a statement-level global hierarchy and a token-level local hierarchy. We then propose the Hierarchy Transformer (HiT), a simple but effective sequence model that incorporates the complete hierarchical embeddings of source code into a Transformer model. We demonstrate the effectiveness of hierarchical embedding for learning code structure with an experiment on a variable scope detection task. Further evaluation shows that HiT outperforms SOTA baseline models and shows stable training efficiency on three source-code-related tasks, covering both classification and generation, across 8 different datasets.

Comment: Accepted by ICPC 202
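
To make the two levels concrete, here is a rough Python sketch using the standard `ast` module. It is an assumption for illustration, not HiT's definition: `statement_depths` approximates a statement-level global hierarchy as each statement's nesting depth, and `token_paths` approximates a token-level local hierarchy as the chain of syntax-node types from an identifier's enclosing statement down to the identifier. HiT's actual embeddings, and how they are fed into the Transformer, are not reproduced here.

```python
import ast
from typing import List, Tuple


def statement_depths(source: str) -> List[Tuple[int, str, int]]:
    """(line, statement type, nesting depth) for every statement; top level is depth 0."""
    tree = ast.parse(source)
    out: List[Tuple[int, str, int]] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                out.append((child.lineno, type(child).__name__, depth))
                visit(child, depth + 1)
            else:
                # Expressions, arguments, handlers, etc. do not open a new statement level.
                visit(child, depth)

    visit(tree, 0)
    return sorted(out)


def token_paths(source: str) -> List[Tuple[str, List[str]]]:
    """For each identifier use (ast.Name), the node types from its enclosing statement down to it."""
    tree = ast.parse(source)
    out: List[Tuple[str, List[str]]] = []

    def visit(node: ast.AST, path: List[str]) -> None:
        for child in ast.iter_child_nodes(node):
            name = type(child).__name__
            # A statement starts a fresh local path; other nodes extend the current one.
            child_path = [name] if isinstance(child, ast.stmt) else path + [name]
            if isinstance(child, ast.Name):
                out.append((child.id, child_path))
            visit(child, child_path)

    visit(tree, [])
    return out
```

For `def f(x):` / `if x:` / `return x + 1` / `return 0`, the first function assigns depths 0, 1, 2 and 1 to the four statements, and the second yields the paths `['If', 'Name']` and `['Return', 'BinOp', 'Name']` for the two uses of `x`.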

arXiv.org e-Print Archive

Exploring the metabolic network of the epidemic pathogen Burkholderia cenocepacia J2315 via genome-scale reconstruction

Author: Chang Suhua
Fang Kechi
Godinho Miguel
Lam Carolyn M C
Martins dos Santos Vítor A P
Panda Gurudutta
Sun Changyue
Wang Jing
Zhang Kunlin
Zhao Hansheng
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Burkholderia cenocepacia is a threatening nosocomial epidemic pathogen in patients with cystic fibrosis (CF) or a compromised immune system. Its high level of antibiotic resistance is an increasing concern in treating its infections. B. cenocepacia strain J2315 is the most infectious isolate from CF patients. There is a strong need to reconstruct a genome-scale metabolic network of B. cenocepacia J2315 to systematically analyze its metabolic capabilities and virulence traits, and to search for potential clinical therapy targets. Results: We reconstructed the genome-scale metabolic network of B. cenocepacia J2315. An iterative reconstruction process led to a robust model, iKF1028, which accounts for 1,028 genes, 859 internal reactions, and 834 metabolites. The model iKF1028 captures important metabolic capabilities of B. cenocepacia J2315, with a particular focus on the biosynthesis of key metabolic virulence factors, to assist in understanding the mechanism of infection and identifying potential drug targets. The model was tested with BIOLOG assays. Based on the model, the genome annotation of B. cenocepacia J2315 was refined and 24 genes were re-annotated. Gene and enzyme essentiality were analyzed to provide further insight into genome function and architecture. A total of 45 essential enzymes were identified as potential therapeutic targets. Conclusions: As the first genome-scale metabolic network of B. cenocepacia J2315, iKF1028 allows a systematic study of the metabolic properties of B. cenocepacia and its key metabolic virulence factors affecting the CF community. The model can be used as a discovery tool to design novel drugs against diseases caused by this notorious pathogen.
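
Essentiality in genome-scale models is normally computed with flux balance analysis. As a much simpler, dependency-free illustration of the idea, the Python sketch below uses a topological (network-expansion) criterion instead: a reaction is flagged essential if deleting it makes a target metabolite unreachable from the available nutrients. The toy network, the metabolite names and the helpers `reachable` and `essential_reactions` are hypothetical and are not taken from iKF1028.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A reaction maps a set of substrates to a set of products.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def reachable(reactions: Dict[str, Reaction], nutrients: Set[str]) -> Set[str]:
    """Network expansion: keep firing any reaction whose substrates are all available."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def essential_reactions(reactions: Dict[str, Reaction], nutrients: Set[str], target: str) -> List[str]:
    """Reactions whose single deletion makes `target` unreachable."""
    essential = []
    for name in reactions:
        knocked_out = {k: v for k, v in reactions.items() if k != name}
        if target not in reachable(knocked_out, nutrients):
            essential.append(name)
    return sorted(essential)


# Hypothetical toy network: R3 is a bypass of R1/R2, so only R4 is essential for "ala".
toy: Dict[str, Reaction] = {
    "R1": (frozenset({"glc"}), frozenset({"g6p"})),
    "R2": (frozenset({"g6p"}), frozenset({"pyr"})),
    "R3": (frozenset({"glc"}), frozenset({"pyr"})),
    "R4": (frozenset({"pyr", "nh4"}), frozenset({"ala"})),
}
```

With nutrients `{"glc", "nh4"}`, `essential_reactions(toy, {"glc", "nh4"}, "ala")` returns `["R4"]`: the alternative route through R3 makes R1 and R2 individually dispensable. Unlike flux balance analysis, this check ignores stoichiometry and cofactor balance.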

Helmholtz Zentrum für Infektionsforschung Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Institute of Psychology, Chinese Academy of Sciences

Wageningen University & Research Publications

In Silico Insights into the Symbiotic Nitrogen Fixation in Sinorhizobium meliloti via Metabolic Reconstruction

BACKGROUND: Sinorhizobium meliloti is a soil bacterium known for its capability to establish symbiotic nitrogen fixation (SNF) with leguminous plants such as alfalfa. S. meliloti 1021 is the most extensively studied strain for understanding the mechanism of SNF and for studying legume-microbe interactions. To provide insight into the metabolic characteristics underlying the SNF mechanism of S. meliloti 1021, there is an increasing demand to reconstruct a metabolic network for the SNF stage of S. meliloti 1021. RESULTS: Through an iterative reconstruction process, we present a metabolic network of S. meliloti 1021 during the SNF stage, named iHZ565, which accounts for 565 genes, 503 internal reactions, and 522 metabolites. Under a newly defined objective function, the in silico predicted flux distribution was highly consistent with previously reported in vivo evidence, which demonstrates the robustness of the model. Based on the model, the genome annotation of S. meliloti 1021 was refined and 15 genes were re-annotated. Of the 565 metabolic genes included in iHZ565, 112 (19.8%) were predicted to be essential for efficient SNF in bacteroids under in silico microaerobic and nutrient-sharing conditions. CONCLUSIONS: As the first metabolic network of S. meliloti 1021 during the SNF stage, the manually curated model iHZ565 provides an overview of the major metabolic properties of the SNF bioprocess in S. meliloti 1021. The predicted SNF-required essential genes will facilitate understanding of the key functions in SNF and help identify key genes and design experiments for further validation. The model iHZ565 can be used as a knowledge-based framework for better understanding the symbiotic relationship between rhizobia and legumes, ultimately uncovering the mechanism of nitrogen fixation in bacteroids and providing new strategies to efficiently improve biological nitrogen fixation.
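
Flux distributions like the one predicted for iHZ565 are commonly computed with flux balance analysis (FBA). Assuming the standard formulation (the abstract does not give the paper's newly defined objective function), with stoichiometric matrix S (metabolites by reactions), flux vector v over n reactions, flux bounds, and objective weight vector c, the problem is the linear program:

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1,\dots,n.
\end{aligned}
```

The constraint S v = 0 encodes the steady-state assumption that every internal metabolite is produced and consumed at equal rates. Under this framework, a gene is usually predicted essential when forcing the fluxes of the reactions it controls to zero drops the optimal objective value below a small fraction of its wild-type value.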

Crossref

Directory of Open Access Journals

PubMed Central

Institute of Psychology, Chinese Academy of Sciences

FigShare

Network-assisted analysis of primary Sjogren's syndrome GWAS data in Han Chinese

Author: Fang Kechi
Wang Jing
Zhang Kunlin
Publication venue: Springer Science and Business Media LLC
Publication date: 21/12/2015

Primary Sjogren's syndrome (pSS) is a complex autoimmune disorder. So far, genetic research on pSS has lagged far behind, and the underlying biological mechanism is unclear. Further exploration of existing genome-wide association study (GWAS) data is urgently needed to uncover disease-related gene combination patterns. Herein, we conducted a network-based analysis integrating pSS GWAS data in Han Chinese with a protein-protein interaction network to identify pSS candidate genes. After module detection and evaluation, 8 dense modules covering 40 genes were obtained for further functional annotation. An additional 31 MHC genes with significant gene-level P-values (sigMHC-genes) were also retained. The combined module genes and sigMHC-genes, 71 genes in total, were designated pSS candidate genes. Of these candidates, 14 genes had been reported to be associated with pSS, RA, or SLE, namely STAT4, GTF2I, HLA-DPB1, HLA-DRB1, PTTG1, HLA-DQB1, MBL2, TAP2, CFLAR, NFKBIE, HLA-DRA, APOM, HLA-DQA2 and NOTCH4. This is the first report of a network-assisted analysis of pSS GWAS data to explore combined gene patterns associated with pSS. Our study suggests that network-assisted analysis is a useful approach for gaining further insight into the biology of associated genes and provides important clues for future research into pSS etiology.
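
A simplified Python sketch of one common way to search a protein-protein interaction network for dense, GWAS-enriched modules. It is an assumed scheme, not necessarily the procedure used in this study: each gene-level P-value p is converted to z = Phi^-1(1 - p), a module of k genes is scored as Zm = sum(z) / sqrt(k), and a seed gene grows greedily by adding the neighbor that most increases Zm while the relative gain exceeds `min_gain`. The toy network, the P-values and `min_gain = 0.1` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, Iterable, List, Set, Tuple


def gene_z(p: float) -> float:
    """Convert a gene-level P-value (0 < p < 1) to a z-score, z = Phi^-1(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p)


def module_score(genes: Iterable[str], z: Dict[str, float]) -> float:
    """Stouffer-style module score Zm = sum(z) / sqrt(k)."""
    values = [z[g] for g in genes]
    return sum(values) / sqrt(len(values))


def grow_module(seed: str, graph: Dict[str, List[str]], z: Dict[str, float],
                min_gain: float = 0.1) -> Tuple[Set[str], float]:
    """Greedily add the best neighbor while it raises Zm by more than min_gain (relative)."""
    module = {seed}
    score = module_score(module, z)
    while True:
        # Sorted so that ties are broken deterministically.
        neighbors = sorted({n for g in module for n in graph.get(g, []) if n not in module})
        if not neighbors:
            break
        best = max(neighbors, key=lambda n: module_score(module | {n}, z))
        new_score = module_score(module | {best}, z)
        if new_score - score <= min_gain * abs(score):
            break
        module.add(best)
        score = new_score
    return module, score


# Hypothetical toy data: gene-level P-values and an undirected PPI adjacency list.
p_values = {"A": 0.001, "B": 0.01, "C": 0.02, "D": 0.8}
ppi = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}
z_scores = {g: gene_z(p) for g, p in p_values.items()}
module, score = grow_module("A", ppi, z_scores)
```

Seeded at A, the module grows to {A, B, C} with Zm of about 4.31 and stops, because adding the weakly associated gene D (p = 0.8) would lower the score.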

PubMed Central

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

Network-Based Analysis of Schizophrenia Genome-Wide Association Data to Detect the Joint Functional Association Signals.

Author: Jing Wang
Kechi Fang
Kunlin Zhang
Suhua Chang
Publication venue: Public Library of Science (PLoS)
Publication date: 20/07/2015

Schizophrenia is a common psychiatric disorder with high heritability and a complex genetic architecture. Genome-wide association studies (GWAS) have identified several significant loci associated with schizophrenia, but the heritability they explain is still low. Growing evidence shows that schizophrenia is attributable to multiple genes with moderate effects. In-depth mining and integration of GWAS data are urgently needed to uncover disease-related gene combination patterns. Network-based analysis is a promising strategy for interpreting GWAS results and identifying disease-related network modules. We performed a network-based analysis of three independent schizophrenia GWASs using a refined analysis framework, which included a more accurate gene P-value calculation, a dynamic network module searching algorithm, and detailed functional analysis of the genes in the resulting modules. The analysis produced 79 modules including 238 genes, which form a highly connected subnetwork with more statistical significance than expected by chance. The results validated several reported disease genes, such as MAD1L1, MCC, SDCCAG8, VAT1L, MAPK14, MYH9 and FXYD6, and also revealed several novel candidate genes and gene-gene interactions. Pathway enrichment analysis suggested that the module genes were enriched in several neural and immune system related pathways/GO terms, such as the neurotrophin signaling pathway, synaptosome, regulation of protein ubiquitination, and antigen processing and presentation. Further crosstalk analysis revealed that these pathways/GO terms interact with each other and identified several important genes that might play vital roles in connecting these functions. Our network-based analysis of schizophrenia GWASs will facilitate the understanding of the genetic mechanisms of schizophrenia.
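
A statement such as "more statistical significance than expected by chance" is typically backed by a permutation test. The minimal Python sketch below assumes the test statistic is the number of network edges among the selected genes and that the null distribution comes from random gene sets of the same size drawn from the network; the paper's exact null model may differ, and the toy graph and function names are hypothetical.

```python
import random
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple

Edge = Tuple[str, str]


def internal_edges(genes: Iterable[str], edges: Sequence[Edge]) -> int:
    """Count network edges whose two endpoints are both in `genes`."""
    gene_set = set(genes)
    return sum(1 for u, v in edges if u in gene_set and v in gene_set)


def connectivity_pvalue(selected: Sequence[str], all_genes: Iterable[str], edges: Sequence[Edge],
                        n_perm: int = 10_000, seed: int = 0) -> Tuple[int, float]:
    """Empirical P-value: fraction of random same-size gene sets at least as connected."""
    rng = random.Random(seed)
    observed = internal_edges(selected, edges)
    pool = sorted(all_genes)
    hits = sum(
        internal_edges(rng.sample(pool, len(selected)), edges) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: a 4-gene clique plus a few scattered edges among 20 genes.
genes: List[str] = [f"G{i}" for i in range(20)]
edges: List[Edge] = list(combinations(["G0", "G1", "G2", "G3"], 2)) + [("G4", "G5"), ("G6", "G7"), ("G8", "G9")]
observed, p = connectivity_pvalue(["G0", "G1", "G2", "G3"], genes, edges, n_perm=5000)
```

The clique has 6 internal edges, and almost no random 4-gene set reaches that, so the empirical P-value is far below 0.01. Degree-preserving null models are a stricter alternative when hub genes could inflate connectivity.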

Directory of Open Access Journals

PubMed Central

Institutional Repository of Institute of Psychology, Chinese Academy of Sciences

Consistency Analysis of Large-scale Energy Storage Batteries

Author: Kechi Chen
Pengcheng Zhou
Qianzi Lu
Xueliang Ping
Yuling Zhang
Publication venue: EDP Sciences
Publication date: 27/06/2022

With the development of large-scale electrochemical energy storage power stations, lithium-ion batteries, which have distinct advantages in energy density, power density, and cycle life, are being applied in power system energy storage. However, behind this rapid development, many key issues remain unresolved, and these can lead to various safety accidents. It is therefore important to analyze the consistency of lithium batteries used in large-scale power systems in preparation for system safety assessment. This paper explains the causes and manifestations of inconsistency and, building on data mining algorithms, evaluates the consistency of lithium-ion batteries by clustering their charging voltage curves with subtractive clustering.
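
Subtractive clustering picks cluster centers among the data points themselves: each point gets a "potential" measuring how many other points lie nearby, the point with the highest potential becomes a center, and that center's influence is subtracted before the next pick. The Python sketch below applies this rule, with a simplified stopping criterion, to charging voltage curves, assuming each curve has already been resampled at common points and scaled to [0, 1] so that it can be treated as a vector. The radius `ra = 0.5`, the common `rb = 1.5 * ra` choice, the stopping ratio `eps = 0.15` and the toy curves are illustrative assumptions; the abstract does not give the paper's settings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtractive_clustering(points: List[Vector], ra: float = 0.5, eps: float = 0.15) -> List[int]:
    """Return the indices of points chosen as cluster centers."""
    rb = 1.5 * ra  # wider radius for the subtraction step
    alpha, beta = 4.0 / ra**2, 4.0 / rb**2
    # Potential of each point: a Gaussian-weighted count of its neighbors.
    potential = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points]
    first_peak = max(potential)
    centers: List[int] = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        # Stop once the best remaining potential is small relative to the first center's.
        if peak <= 0 or peak < eps * first_peak:
            break
        centers.append(k)
        # Subtract the new center's influence so nearby points are unlikely to be picked again.
        potential = [
            pot - peak * math.exp(-beta * sq_dist(p, points[k]))
            for pot, p in zip(potential, points)
        ]
    return centers


def assign(points: List[Vector], centers: List[int]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [min(centers, key=lambda c: sq_dist(p, points[c])) for p in points]


# Hypothetical normalized charging curves: two groups of similar cells.
curves = [
    [0.10, 0.50, 0.90],
    [0.12, 0.52, 0.91],
    [0.11, 0.49, 0.88],
    [0.40, 0.80, 1.00],
    [0.41, 0.82, 0.99],
]
centers = subtractive_clustering(curves)
labels = assign(curves, centers)
```

The five toy curves form two groups, so two centers are found and each curve is labeled with its own group's center. A convenient property of this method is that the number of clusters is not fixed in advance; it follows from `ra` and `eps`.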

EDP Sciences OAI-PMH repository (1.2.0)

ToolCoder: Teach Code Generation Models to use API search tools

Author: Jin Zhi
Li Ge
Li Jia
Li Zhuo
Zhang Kechi
Publication date: 09/05/2023

Automatically generating source code from natural language descriptions has been a growing field of research in recent years. However, current large-scale code generation models often have difficulty selecting appropriate APIs for specific contexts. These models may generate APIs that do not meet requirements or refer to non-existent APIs in third-party libraries, especially lesser-known or private libraries. Inspired by how human developers use tools to search for APIs, we propose ToolCoder, a novel approach that integrates API search tools with existing models to assist in code generation and API selection. To teach our model to use tools, we introduce an automated data annotation method that uses ChatGPT to add tool usage information to source code data, and we fine-tune code generation models on the annotated data. During inference, we integrate API search tools into the generation process so that our model can automatically use the search tool to get suggestions when selecting an API. Our experimental results show that ToolCoder achieves strong performance and generalization across five public and private library code generation benchmarks, with at least a 6.21% improvement in average pass@1 and a 9.64% improvement in average pass@10 compared with state-of-the-art methods. Furthermore, we show that our relatively small ToolCoder model is comparable to one of the current best models, GPT-3.5, highlighting the potential of incorporating programming tools into the code generation process.
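
Metrics like the pass@1 and pass@10 figures above are usually computed with the unbiased pass@k estimator introduced alongside the HumanEval benchmark: for each problem, draw n samples, count the c that pass the tests, estimate pass@k = 1 - C(n - c, k) / C(n, k), and average over problems. The Python sketch below assumes this standard estimator; the abstract does not state exactly how ToolCoder's numbers were computed.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples passes), given c of n samples passed."""
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n samples, c correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 10 samples of which 3 pass, `pass_at_k(10, 3, 1)` is 0.3, the plain fraction correct, while `pass_at_k(10, 3, 10)` is 1.0. Averaging this estimate is preferred over literally drawing k samples because it has lower variance.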

arXiv.org e-Print Archive

Protein-protein interaction network involving all merged module genes.

Author: Jing Wang (4865)
Kechi Fang (185614)
Kunlin Zhang (255390)
Suhua Chang (43130)
Publication venue
Publication date
Field of study

Square nodes denote the reported genes associated with schizophrenia or bipolar disorder. Node color is proportional to the gene's P-value. Edge width is proportional to the number of times the edge recurs across the modules. Purple, green and blue edges denote interactions from MGS, Affy6 and Affy500K, respectively.
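
The edge-width encoding in this caption amounts to counting how many modules contain each interaction. The Python sketch below shows that bookkeeping, assuming each module is given as a list of undirected edges and that widths are scaled linearly between hypothetical `min_width` and `max_width` values; the figure's actual scaling is not stated, and the gene identifiers are placeholders.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple

Edge = Tuple[str, str]


def edge_repeats(modules: Iterable[List[Edge]]) -> Counter:
    """Count in how many modules each undirected edge appears."""
    counts: Counter = Counter()
    for module in modules:
        # Treat (a, b) and (b, a) as the same edge, and count each edge once per module.
        counts.update({frozenset(e) for e in module})
    return counts


def edge_widths(counts: Counter, min_width: float = 1.0, max_width: float = 6.0) -> Dict[FrozenSet[str], float]:
    """Map repeat counts linearly onto line widths."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {e: min_width + (max_width - min_width) * (n - lo) / span for e, n in counts.items()}


# Placeholder modules: the G1-G2 interaction recurs in all three.
modules = [[("G1", "G2"), ("G2", "G3")], [("G2", "G1")], [("G1", "G2"), ("G4", "G5")]]
widths = edge_widths(edge_repeats(modules))
```

Here the G1-G2 edge appears in three modules and receives the maximum width of 6.0, while edges seen only once get the minimum width of 1.0.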

FigShare